Using MMIL for the High Level Semantic Annotation of the French MEDIA Dialogue Corpus
Abstract
The MultiModal Interface Language formalism (MMIL) has been selected as the High Level Semantic (HLS) formalism for annotating the French MEDIA dialogue corpus. This corpus consists of human-machine dialogues in the domain of hotel reservation and tourist information. The utterances in these dialogues were previously annotated with flat concept-value semantics for studying and evaluating spoken language understanding modules in dialogue systems. We now investigate the use of more complex representations to improve understanding capability. The MMIL intermediate language is a high level semantic formalism that carries relevant linguistic information, from syntax up to discourse. This representation should increase the expressivity of the current annotation, though at the cost of a more complex annotation process. In this paper we present our first attempt at defining annotation guidelines for the HLS annotation of the MEDIA corpus, and we examine their effect on the annotation process itself, as revealed by annotator disagreements arising from the different hierarchy levels and the feature granularity defined in MMIL.
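The difference between a flat concept-value annotation and a hierarchical one, and why hierarchy introduces new places for annotators to disagree, can be illustrated with a small sketch. It assumes a generic tree of labeled nodes: the `Node` class, the concept labels (`reservation`, `request`, `room-count`, `city`), and the helpers `flatten`, `disagreements_by_depth` and `request` are all hypothetical and do not reproduce the actual MMIL schema or the MEDIA concept inventory.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# A flat concept-value annotation: a sequence of (concept, value) pairs.
FlatAnnotation = List[Tuple[str, str]]


@dataclass
class Node:
    """One node of a hypothetical hierarchical annotation (NOT the real MMIL schema)."""
    label: str
    value: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def flatten(node: Node) -> FlatAnnotation:
    """Project a tree onto flat concept-value pairs, discarding the hierarchy."""
    pairs: FlatAnnotation = []
    if node.value is not None:
        pairs.append((node.label, node.value))
    for child in node.children:
        pairs.extend(flatten(child))
    return pairs


def disagreements_by_depth(a: Node, b: Node, depth: int = 0,
                           counts: Optional[Dict[int, int]] = None) -> Dict[int, int]:
    """Count node mismatches per tree depth between two annotators' trees.

    Children are aligned by position; unmatched extra children are counted
    as disagreements one level below the current node.
    """
    if counts is None:
        counts = {}
    if a.label != b.label or a.value != b.value:
        counts[depth] = counts.get(depth, 0) + 1
    for ca, cb in zip(a.children, b.children):
        disagreements_by_depth(ca, cb, depth + 1, counts)
    extra = abs(len(a.children) - len(b.children))
    if extra:
        counts[depth + 1] = counts.get(depth + 1, 0) + extra
    return counts


def request(rooms: str, city: str) -> Node:
    """Hypothetical helper: one room request attached to a city."""
    return Node("request", children=[Node("room-count", rooms), Node("city", city)])


# "Two rooms in Paris and one in Lyon" vs. a reading that swaps the counts.
annotator_a = Node("reservation", children=[request("2", "Paris"), request("1", "Lyon")])
annotator_b = Node("reservation", children=[request("1", "Paris"), request("2", "Lyon")])

# Compared as unordered collections, the flat projections are identical...
print(sorted(flatten(annotator_a)) == sorted(flatten(annotator_b)))  # True
# ...but the trees disagree on two room-count values, both at depth 2.
print(disagreements_by_depth(annotator_a, annotator_b))  # {2: 2}
```

The design choice here is deliberately simple: children are aligned by position, so the depth-wise count only approximates how disagreements spread across hierarchy levels. A real comparison of MMIL annotations would need an alignment that respects the formalism's own structure and feature granularity.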
Similar resources
Extending MMIL Semantic Representation: Experiments in Dialogue Systems and Semantic Annotation of Corpora
The MultiModal Interface Language formalism (MMIL) is a modality-independent high-level semantic representation language. It has been used in different projects, related to different domains, and with distinct tasks and interaction modes. MMIL is a metamodel that enables the definition of generic and domain-specific descriptors for dialogue management, offering flexibility and high reusability. T...
Semantic Frame Annotation on the French MEDIA corpus
This paper introduces a knowledge representation formalism used for annotation of the French MEDIA dialogue corpus in terms of high level semantic structures. The semantic annotation, worked out according to the Berkeley FrameNet paradigm, is incremental and partially automated. We describe an automatic interpretation process for composing semantic structures from basic semantic constituents us...
An Incremental Architecture for the Semantic Annotation of Dialogue Corpora with High-Level Structures. A case of study for the MEDIA corpus
The semantic annotation of dialogue corpora makes it possible to build efficient language understanding applications that support enjoyable and effective human-machine interaction. Nevertheless, the annotation process can be costly, time-consuming and complicated, particularly as the semantic formalism becomes more expressive. In this work, we propose a bootstrapping architecture for the semantic annota...
Portability of Semantic Annotations for Fast Development of Dialogue Corpora
The generalization of spoken dialogue systems increases the need for fast development of spoken language understanding modules for the semantic tagging of speaker turns. Statistical methods perform well on this task but require large training corpora. Collecting such corpora is expensive in time and human expertise. In this paper we propose a semi-automatic annotation process for fast pr...
Leveraging study of robustness and portability of spoken language understanding systems across languages and domains: the PORTMEDIA corpora
The PORTMEDIA project aims to develop new corpora for the evaluation of spoken language understanding systems. The newly collected data are in the field of human-machine dialogue systems for tourist information in French, in line with the MEDIA corpus. Transcriptions and semantic annotations, obtained by low-cost procedures, are provided to allow a thorough evaluation of the systems’ capa...